76 ◾ Bioinformatics
Add the following to the end of the file:
export PATH="your_path/bowtie2":$PATH
Remember to replace “your_path” with the actual path on your computer. Save the file
and exit. You may need to restart the terminal or run “source ~/.bashrc” for the change
to take effect. Then enter “bowtie2” in the terminal. If Bowtie2 is installed and its path
is set correctly, the help screen will be displayed.
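A quick way to confirm that the shell can find the executable is sketched below. The helper name “check_tool” is hypothetical and is not part of Bowtie2; it relies only on the standard shell built-in “command -v”.

```shell
# check_tool is a hypothetical helper, not part of Bowtie2.
# It reports whether a program can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found; check the PATH line added to .bashrc" >&2
        return 1
    fi
}

# Do not abort the session if Bowtie2 is missing; just report it.
check_tool bowtie2 || true
```

If the tool is reported as not found, re-check the directory given in the “export PATH” line and reload the shell configuration.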
Before read mapping, we need to use the “bowtie2-build” command to index the FASTA
sequence of the reference genome. Enter “bowtie2-build” on the command line of the
terminal to display the help screen, which shows the usage and options. The general syntax
is as follows:
bowtie2-build [options] <reference_in> <ebwt_outfile_base>
The “bowtie2-build” command requires a FASTA file of the reference genome as input
and a string that is used as the prefix of the index file names. The following
command indexes the human genome for Bowtie2. Before running the command, make
sure that the current working directory is the parent of the “refgenome” directory, where we
downloaded the human genome.
bowtie2-build \
--threads 4 \
refgenome/GRCh38.p13_ref.fna \
refgenome/bowtie2
Indexing may take around 25 minutes using four processors on a computer with 32 GB
of memory. The “bowtie2-build” command generates six index files whose names begin with
the prefix string provided to the command. Pre-built indexes for some organisms can also be
downloaded from the official Bowtie2 website.
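To check that indexing completed, the six expected files can be listed and tested for existence. The following is a minimal sketch: the helper name “list_index_files” is hypothetical, and the “.bt2” suffixes assume a standard (small) index. Bowtie2 uses the “.bt2l” suffix for very large genomes.

```shell
# Assumption: the index was built with the prefix used in the text and is a
# standard index whose files end in .bt2 (very large genomes use .bt2l).
prefix="refgenome/bowtie2"

# list_index_files is a hypothetical helper that prints the six expected names.
list_index_files() {
    for suffix in 1 2 3 4 rev.1 rev.2; do
        echo "$1.$suffix.bt2"
    done
}

# Report any expected index file that is missing.
for f in $(list_index_files "$prefix"); do
    [ -e "$f" ] || echo "missing: $f"
done
```

No output means that all six files are present under the given prefix.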
After indexing the reference genome, we can use the “bowtie2” command to align the
paired-end reads and generate a SAM file:
bowtie2 -x refgenome/bowtie2 \
-1 data/SRR769545_1.fastq.gz \
-2 data/SRR769545_2.fastq.gz \
-S sam/SRR769545_bowtie2.sam
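Bowtie2 itself writes alignments in SAM format; its “-b” option specifies unaligned BAM
files as input reads rather than BAM output. To obtain a BAM file, omit the “-S” option so
that the alignments are written to standard output, and pipe them to a converter such as
“samtools view -b”. To learn more about Bowtie2’s options, enter “bowtie2” on the command
line of the Linux terminal.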
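Because Bowtie2 emits alignments in SAM format, one common way to obtain a BAM file is to pipe its output into SAMtools. The following is a minimal sketch under several assumptions: SAMtools is installed (it is a separate program, not part of Bowtie2), the index built earlier exists as a standard “.bt2” index, and the output directory “bam” is a hypothetical name chosen for illustration.

```shell
# Assumptions: SAMtools is installed separately, and "bam" is a
# hypothetical output directory chosen for this example.
out_bam="bam/SRR769545_bowtie2.bam"

# Run only if both tools and the index built earlier are available.
if command -v bowtie2 >/dev/null 2>&1 \
   && command -v samtools >/dev/null 2>&1 \
   && [ -e refgenome/bowtie2.1.bt2 ]; then
    mkdir -p "$(dirname "$out_bam")"
    # Without -S, Bowtie2 writes SAM to standard output;
    # samtools view -b converts that stream to BAM.
    bowtie2 -x refgenome/bowtie2 \
        -1 data/SRR769545_1.fastq.gz \
        -2 data/SRR769545_2.fastq.gz \
      | samtools view -b -o "$out_bam" -
else
    echo "bowtie2, samtools, and the Bowtie2 index are required" >&2
fi
```

Piping avoids writing the much larger intermediate SAM file to disk.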
2.3.2.3 STAR
STAR, which stands for Spliced Transcripts Alignment to a Reference, is a fast read aligner
developed to handle the alignment of massive numbers of RNA-Seq reads. Its alignment